Parallel Triangular Sylvester-Type Matrix Equation Solvers for SMP Systems Using Recursive Blocking
Abstract
We present recursive blocked algorithms for solving triangular Sylvester-type matrix equations. Recursion leads to automatic blocking that is variable and "squarish". The main part of the computations is performed as level 3 general matrix multiply and add (GEMM) operations. We also present new, highly optimized superscalar kernels for solving small-sized matrix equations stored in level 1 cache. As a result, a larger part of the total execution time is spent in GEMM operations. In turn, this leads to much better performance, especially for small to medium-sized problems, and improved parallel scalability on shared memory processor (SMP) systems. Uniprocessor and SMP parallel performance results are presented and compared with results from existing LAPACK routines for solving this type of matrix equation.
Today, block-based algorithms are the standard in state-of-the-art software for dense linear algebra computations. These algorithms perform most of their computations in calls to high-performance level 3 BLAS. The level 3 BLAS obtain most of their performance from data rearrangements and the use of high-performance atomic kernel routines. The current state-of-the-art BLAS are designed to exploit the architecture design while maintaining the functionality of the BLAS, thereby guaranteeing high performance and portability of dense linear algebra codes [5, 6, 4]. The level 3 factorization algorithms typified by LAPACK make repeated calls to the level 3 BLAS with matrix operands equal to submatrices of fixed size. This results in repeated copying of data for operands that are related. How can this be challenged? One answer is to change the data structure of the BLAS input matrices to reflect this relationship, thereby removing any data copying from the BLAS. The obvious choice is to store matrices as blocks, i.e., submatrices of block-partitioned matrices, instead of the standard column-major (Fortran) or row-major (C) orderings.
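As a rough illustration of the recursive blocking summarized in the abstract, the sketch below solves the triangular continuous-time Sylvester equation AX − XB = C by repeatedly halving the larger dimension, so that the sub-blocks stay roughly square and the coupling updates are plain matrix multiplies (GEMM). It is a minimal sketch, not the authors' implementation: the names `rec_sylvester` and `solve_leaf` and the `leaf` threshold are hypothetical, A and B are assumed to be strictly upper triangular (not quasi-triangular) with disjoint spectra, and the leaf solver uses a dense Kronecker-product system in place of the optimized superscalar kernels the abstract mentions.

```python
import numpy as np


def solve_leaf(A, B, C):
    """Solve a small A X - X B = C directly (stand-in for an optimized kernel).

    Uses vec(A X - X B) = (I kron A - B^T kron I) vec(X) with column-major vec.
    """
    m, n = C.shape
    K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((m, n), order="F")


def rec_sylvester(A, B, C, leaf=4):
    """Recursive blocked solve of A X - X B = C.

    Assumes A (m x m) and B (n x n) are upper triangular.
    `leaf` is a hypothetical cutoff below which the direct solver is used.
    """
    m, n = C.shape
    if m <= leaf and n <= leaf:
        return solve_leaf(A, B, C)

    if m >= n:
        # Split A = [[A11, A12], [0, A22]] and X, C by rows.
        h = m // 2
        # Bottom block row does not depend on X1: A22 X2 - X2 B = C2.
        X2 = rec_sylvester(A[h:, h:], B, C[h:, :], leaf)
        # GEMM update of the right-hand side, then A11 X1 - X1 B = C1 - A12 X2.
        X1 = rec_sylvester(A[:h, :h], B, C[:h, :] - A[:h, h:] @ X2, leaf)
        return np.vstack([X1, X2])

    # Split B = [[B11, B12], [0, B22]] and X, C by columns.
    h = n // 2
    # Left block column does not depend on X2: A X1 - X1 B11 = C1.
    X1 = rec_sylvester(A, B[:h, :h], C[:, :h], leaf)
    # GEMM update of the right-hand side, then A X2 - X2 B22 = C2 + X1 B12.
    X2 = rec_sylvester(A, B[h:, h:], C[:, h:] + X1 @ B[:h, h:], leaf)
    return np.hstack([X1, X2])
```

Calling `rec_sylvester(A, B, C)` returns X with the same shape as C. Always splitting the larger of the two dimensions is what keeps the blocks near-square in this sketch; the Python slicing here copies data and is meant only to show the recursion, not to match the memory behavior of a tuned implementation.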
Recursion is a key concept for matching an algorithm and its data structure. A recursive algorithm leads to automatic blocking which is variable and "squarish" [2]. This layered and variable blocking allows for good data locality, which
Similar resources
Combining Explicit and Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on Distributed Memory Platforms
Parallel ScaLAPACK-style hybrid algorithms for solving the triangular continuous-time Sylvester (SYCT) equation AX − XB = C using recursive blocked node solvers from the novel high-performance library RECSY are presented. We compare our new hybrid algorithms with parallel implementations based on the SYCT solver DTRSYL from LAPACK. Experiments show that the RECSY solvers can significantly impro...
RECSY - A High Performance Library for Sylvester-Type Matrix Equations
RECSY is a library for solving triangular Sylvester-type matrix equations. Its objectives are both speed and reliability. In order to achieve these goals, RECSY is based on novel recursive blocked algorithms, which call high-performance kernels for solving small-sized leaf problems of the recursion tree. In contrast to explicit standard blocking techniques, our recursive approach leads to an au...
Parallel Algorithms and Condition Estimators for Standard and Generalized Triangular Sylvester-Type Matrix Equations
We discuss parallel algorithms for solving eight common standard and generalized triangular Sylvester-type matrix equations. Our parallel algorithms are based on explicit blocking, 2D block-cyclic data distribution of the matrices and wavefront-like traversal of the right-hand side matrices while solving small-sized matrix equations at different nodes and updating the rest of the right-hand side...
Recursive Blocked Algorithms for Solving Periodic Triangular Sylvester-Type Matrix Equations
Recently, recursive blocked algorithms for solving triangular one-sided and two-sided Sylvester-type equations were introduced by Jonsson and Kågström. This elegant yet simple technique enables an automatic variable blocking that has the potential of matching the memory hierarchies of today's HPC systems. The main parts of the computations are performed as level 3 general matrix multiply and ad...
Publication date: 2000